Abstract:
Recent advances in self-supervised contrastive learning yield good image-level representation, which favors classification tasks but usually neglects pixel-level detailed information, leading to unsatisfactory transfer performance to dense prediction tasks such as semantic segmentation. In this work, we propose a pixel-wise contrastive learning method called CP² (Copy-Paste Contrastive Pretraining), which facilitates both image- and pixel-level representation learning and therefore is more suitable for downstream dense prediction tasks. In detail, we copy-paste a random crop from an image (the foreground) onto different background images and pretrain a semantic segmentation model with the objective of 1) distinguishing the foreground pixels from the background pixels, and 2) identifying the composed images that share the same foreground. Experiments show the strong performance of CP² in downstream semantic segmentation: by finetuning CP² pretrained models on PASCAL VOC 2012, we obtain 78.6% mIoU with a ResNet-50 and 79.5% with a ViT-S.
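The copy-paste step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: images are plain 2D lists, the function names `random_crop`, `paste`, and `make_pair` are hypothetical, and the augmentations, segmentation model, and contrastive losses are omitted. It only shows how one foreground crop can yield two composed images plus the foreground/background masks that the two objectives would rely on.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w crop taken at a random position of `image`."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def paste(crop, background, rng):
    """Paste `crop` at a random position of a copy of `background`.

    Returns (composed, mask); mask is 1 on pasted pixels and 0 elsewhere.
    """
    crop_h, crop_w = len(crop), len(crop[0])
    bg_h, bg_w = len(background), len(background[0])
    top = rng.randint(0, bg_h - crop_h)
    left = rng.randint(0, bg_w - crop_w)
    composed = [row[:] for row in background]  # leave the input untouched
    mask = [[0] * bg_w for _ in range(bg_h)]
    for dy in range(crop_h):
        for dx in range(crop_w):
            composed[top + dy][left + dx] = crop[dy][dx]
            mask[top + dy][left + dx] = 1
    return composed, mask


def make_pair(foreground, background_a, background_b, crop_h, crop_w, rng):
    """Build two composed images that share one foreground crop."""
    crop = random_crop(foreground, crop_h, crop_w, rng)
    return paste(crop, background_a, rng), paste(crop, background_b, rng)
```

In this sketch the masks would supervise the foreground-versus-background objective, while the two composed images form a positive pair because they share the same foreground.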
Abstract:
Image pre-training, the current de-facto paradigm for a wide range of visual tasks, is generally less favored in the field of video recognition. By contrast, a common strategy is to directly train spatiotemporal convolutional neural networks (CNNs) from scratch. Nonetheless, interestingly, by taking a closer look at these from-scratch learned CNNs, we note there exist certain 3D kernels that exhibit much stronger appearance modeling ability than others, arguably suggesting appearance information is already well disentangled in learning. Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels. In addition, we propose Spatial-Temporal Separable (STS) convolution, which explicitly splits the feature channels into spatial and temporal groups, to further enable a more thorough decomposition of spatiotemporal features for fine-tuning 3D CNNs. Our experiments show that simply replacing 3D convolution with STS notably improves a wide range of 3D CNNs without increasing parameters and computation on both Kinetics-400 and Something-Something V2. Moreover, this new training pipeline consistently achieves better results on video recognition with significant speedup. For instance, we improve the top-1 accuracy of SlowFast on Kinetics-400 by 0.6% over the strong 256-epoch 128-GPU baseline while fine-tuning for only 50 epochs with 4 GPUs.
Abstract:
Existing commonsense knowledge bases often organize tuples in an isolated manner, which is deficient for commonsense conversational models to plan the next steps. To fill the gap, we curate a large-scale multi-turn human-written conversation corpus, and create the first Chinese commonsense conversation knowledge graph which incorporates both social commonsense knowledge and dialog flow information. To show the potential of our graph, we develop a graph-conversation matching approach, and benchmark two graph-grounded conversational tasks.
Abstract:
In this paper, the distributed cooperative jamming task of multiple unmanned aerial vehicles (multi-UAVs) against multiple targets is studied. In view of the mapping relationship between the biological behavior mechanism and the cooperative jamming task of multi-UAVs, a circle formation control method for multi-UAVs and a self-organizing method of dynamic resource allocation based on an improved biological pheromone (biopheromone) mechanism are proposed. Each UAV continuously adjusts its position according to its real-time order in the jamming orbit to realize uniform control. Compared with the traditional pheromone mechanism, the improved biopheromone mechanism contains both attracting and inhibiting pheromones, providing positive and negative feedback, and the pheromone carries information about the targets. Each UAV carries a pheromone map and makes decisions autonomously according to the type and concentration of pheromone. With the distributed control structure and the limited-distance communication between UAVs that incorporate the improved biopheromone, each UAV can adapt itself to changes in the internal and external environment. The convergence performance, external response capability, and internal scalability of the method under different conditions are simulated and analyzed; the multi-UAV system can respond quickly and accomplish the jamming task.
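The idea of each UAV adjusting its position according to its order in the orbit can be sketched with a simple neighbor-balancing rule. This is an assumption made for illustration, not the control law of the paper: each UAV is reduced to a phase angle on a circular orbit and moves toward the midpoint between its two orbit neighbors. The names `spacing_step`, `gaps`, and the `gain` parameter are hypothetical.

```python
import math


def spacing_step(angles, gain=0.5):
    """One synchronous update of the UAV phase angles (radians).

    `angles` lists the UAVs in their order along the orbit. Each UAV
    compares the gap to the UAV ahead with the gap to the UAV behind
    and shifts toward the larger gap.
    """
    n = len(angles)
    updated = []
    for i, theta in enumerate(angles):
        ahead = (angles[(i + 1) % n] - theta) % (2 * math.pi)
        behind = (theta - angles[i - 1]) % (2 * math.pi)
        updated.append(theta + gain * (ahead - behind) / 2)
    return updated


def gaps(angles):
    """Angular gap from each UAV to the next one along the orbit."""
    n = len(angles)
    return [(angles[(i + 1) % n] - angles[i]) % (2 * math.pi) for i in range(n)]
```

Under this rule each gap becomes a weighted average of itself and its two neighboring gaps, so repeated updates drive all gaps toward 2π/n using only local information, which matches the distributed, limited-communication setting the abstract describes.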
Abstract:
The growth of credit card consumption makes the security of keypad input a problem that cannot be ignored. We propose a novel keystroke recognition system called WiKey. When the user enters a password on the keypad with his/her fingers, the posture and position of different keystrokes introduce a unique interference to the multi-path signals, which is reflected in the Channel State Information (CSI). After analyzing the fluctuation of the CSI waveform between two keystrokes, we find that there is a strong correlation between the distance of finger movement and the shape of the waveform. We exploit this association to infer the user's number input. Compared with previous keystroke inference approaches, the use of this auxiliary information improves recognition accuracy. We implemented WiKey on an ordinary point-of-sale (POS) terminal. The experimental results show that the average accuracy is about 90%, which is 5-10% higher than that of previous keystroke inference approaches.
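To illustrate why finger-movement distance is useful for inferring number input, the sketch below enumerates the key sequences consistent with a list of estimated inter-keystroke distances. It is not the WiKey pipeline: the keypad layout, unit key spacing, tolerance `tol`, and the names `KEY_POS`, `key_distance`, and `candidate_pins` are all assumptions, and the step that would estimate distances from CSI waveforms is left out.

```python
import math
from itertools import product

# Assumed layout: a standard 3x4 numeric keypad with unit key spacing.
KEY_POS = {
    "1": (0, 0), "2": (1, 0), "3": (2, 0),
    "4": (0, 1), "5": (1, 1), "6": (2, 1),
    "7": (0, 2), "8": (1, 2), "9": (2, 2),
    "0": (1, 3),
}


def key_distance(a, b):
    """Euclidean distance between the centers of keys `a` and `b`."""
    (ax, ay), (bx, by) = KEY_POS[a], KEY_POS[b]
    return math.hypot(ax - bx, ay - by)


def candidate_pins(distances, tol=0.1):
    """Return every key sequence whose consecutive key distances match
    `distances` within `tol` (brute force over all sequences)."""
    length = len(distances) + 1
    matches = []
    for keys in product(KEY_POS, repeat=length):
        if all(abs(key_distance(keys[i], keys[i + 1]) - d) <= tol
               for i, d in enumerate(distances)):
            matches.append("".join(keys))
    return matches
```

For example, a single estimated distance of 3 key widths leaves only the sequences "20" and "02" on this layout, which shows how distance information alone can shrink the search space considerably.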
Abstract:
Based on the sound radiation theory of submerged stiffened cylindrical shells excited by an interior source, a finite stiffened shell is numerically studied in the present paper. The interior mean quadratic sound pressure level, the radiated sound power level and the mean quadratic velocity level are calculated. The effect of cavity-shell coupling on underwater sound radiation from the shell excited by an interior harmonic point source is analyzed; the effect of the point source’s radial and axial position in the sound cavity on sound radiation from the cylindrical shell is discussed; the sound radiation characteristics of a shell excited by two point sources with different phases are discussed. The following conclusions are obtained: the coupling of sound cavity and shell has little effect on the shell’s vibration, interior sound field and sound radiation characteristics when the shell is full of air; a change of the point source’s position has little effect on radiated power, but radial position has a greater effect on mean quadratic velocity and interior mean quadratic sound pressure than axial position; the nearer the point source is to the shell surface, the higher the shell’s mean quadratic velocity is; two radially symmetric point sources with the same or opposite phases have little effect on the shell’s sound radiation characteristics, while they have a great effect if they are symmetric about the midsection of the shell. This provides a theoretical basis for calculating the vibration and sound radiation characteristics of a stiffened cylindrical shell excited by interior sound sources more quickly, and guidance for the placement of interior sound sources.
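For reference, the three levels named in this abstract are commonly defined as below for harmonic excitation. This is a sketch of the usual textbook definitions, not taken from the paper: the symbols S (shell surface area), V (cavity volume), v (complex normal velocity amplitude), p (complex interior pressure amplitude), W (radiated sound power), and the reference values v_0, p_0, W_0 are assumptions, and the paper's own reference values are not given here.

```latex
\langle v^2 \rangle = \frac{1}{2S}\int_S |v|^2 \,\mathrm{d}S,
\qquad
L_v = 10\log_{10}\frac{\langle v^2 \rangle}{v_0^2}

\langle p^2 \rangle = \frac{1}{2V}\int_V |p|^2 \,\mathrm{d}V,
\qquad
L_p = 10\log_{10}\frac{\langle p^2 \rangle}{p_0^2}

L_W = 10\log_{10}\frac{W}{W_0}
```

The factor 1/2 comes from time-averaging a harmonic quantity expressed through its complex amplitude.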